The objective of this project is to analysis out where the free/open hotspots and roadway sections are. We are able to identify the locations and routes that have the best internet infrastructure for testing new services as a consequence. My use case will be Koln.
If you have access to the roadway sections with the necessary internet infrastructure, it will be easier to choose the sites for new services like speed monitoring, traffic light smart atomation, or any IOT application linked to smart cities. With the help of this research, it will be possible to test new technologies without worrying about setting up the required infrastructure.
import pandas as pd # For handling tables
import re # For data preparation
import haversine as hs # For distance calculations
from haversine import Unit # For distance calculations
import plotly.io as pio # For poltting corrdinates over maps
import plotly.express as px # For poltting corrdinates over maps
import copy
Our Dataset consists of two tables :
| Table name | Short discription |
|---|---|
| Hotspots in Köln | List of urban hotspots in Köln. In addition to some information regarding this hotspots like the geometrical location , the house number and street name. |
| Road sections Köln | The Köln street directory provides an overview of all applicable street names , addresses , geometry paths and house numbers in each road section |
hotspots_in_koeln_df = pd.read_sql_table('hotspots_in_koeln', 'sqlite:///../data/project.sqlite')
road_sections_df = pd.read_sql_table('road_sections','sqlite:///../data/project.sqlite')
To get an overview of the Dataset, i will visualize an overview of our tables. This overview contain the most important infromation in each table
pio.renderers.default = "notebook"
fig = px.scatter_mapbox(hotspots_in_koeln_df,
lat="Geometry_y",
lon="Geometry_x",
hover_name= "Haus Nr",
hover_data=["Straßenname", "Haus Nr"],
color="Straßenname",
zoom=11,
height=1400,
width=1200)
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
road_vis_df = copy.deepcopy(road_sections_df)
road_vis_df['Gemotry_path_list'] = road_vis_df['Geometry_paths'].map(lambda x: [list(map(float , element.split(','))) for element in re.findall(r"[.^[]*\[([^]]*)\]", x)] )
road_vis_df = road_vis_df.explode('Gemotry_path_list')
road_vis_df[['Geometry_x','Geometry_y']] = pd.DataFrame(road_vis_df['Gemotry_path_list'].to_list())
pio.renderers.default = "notebook"
fig = px.scatter_mapbox(road_vis_df,
lat="Geometry_y",
lon="Geometry_x",
hover_name= "Straße",
color="Straße",
zoom=11,
height=1400,
width=1200)
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
We need to prepare the data before we proceed with our solutions to the data science challenge.
In our database, the street names serve as the primary key. Unfortunately, the tables don't have standardized street names; for instance, they may finish in XYZ str, XYZ strasse, or other variations. Due to this, merging the two tables and processing the data to find a solution is challenging.
So , the steps of data preparation :
# Making standard street names to match with the street names from the roads table
hotspots_in_koeln_df['Straße_name'] = hotspots_in_koeln_df['Straßenname'].map(lambda x: re.split('[s|S]tr[ |a|"."]', x)[0])
# Making standard street names to match with the street names from the hotspot table
road_sections_df['Straße_name'] = road_sections_df['Straße'].map(lambda x: re.split('[s|S]tr[ |a|"."]', x)[0])
# Merging the hotspots and the road section tables
all_in_one = hotspots_in_koeln_df.merge(road_sections_df , on='Straße_name')
all_in_one['houses in the intersection'] = all_in_one[['Hausnummer von links' , 'Hausnummer bis links' ,'Hausnummer von rechts' , 'Hausnummer bis rechts' ]].values.tolist()
We filter the road section that has house number that directly have a hotspot
interesections_with_hotspots = []
for i , row in all_in_one.iterrows():
try :
# Check if the house number are already one of the houses in the intersection
if row['Haus Nr'].zfill(4) in row['houses in the intersection']:
interesections_with_hotspots.append(row)
except:
pass
interesections_with_hotspots_df = pd.DataFrame(interesections_with_hotspots)
# Combine the geometry coordinates of the hotspot in one coloum
all_in_one['hotspot_geometry'] = all_in_one.apply(lambda x: list([x['Geometry_x'],x['Geometry_y']]) ,axis=1)
# make the geomtry paths list of floats instead of string
all_in_one['Gemotry_path_list'] = all_in_one['Geometry_paths'].map(lambda x: [list(map(float , element.split(','))) for element in re.findall(r"[.^[]*\[([^]]*)\]", x)] )
interesections_with_range_of_hotspots = []
for i , row in all_in_one.iterrows():
loc1 = row['hotspot_geometry']
for loc2 in row['Gemotry_path_list']:
# check if the one of the intersection paths is in the range of the hotspot
if hs.haversine(loc1 , loc2,unit=Unit.METERS) < 45:
interesections_with_range_of_hotspots.append(row)
interesections_with_range_of_hotspots_df = pd.DataFrame(interesections_with_range_of_hotspots)
pio.renderers.default = "notebook"
fig = px.scatter_mapbox(interesections_with_hotspots_df,
lat="Geometry_y",
lon="Geometry_x",
hover_name= "Haus Nr",
hover_data=["Straßenname", "Haus Nr"],
color="Straßenname",
zoom=11,
height=1400,
width=1200 )
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
pio.renderers.default = "notebook"
fig = px.scatter_mapbox(interesections_with_range_of_hotspots_df,
lat="Geometry_y",
lon="Geometry_x",
hover_name= "Haus Nr",
hover_data=["Straßenname", "Haus Nr"],
color="Straßenname",
zoom=11,
height=1400,
width=1200 )
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
To conlude our findings , we have found 41 locations form the first solution and 211 locations form the second solution , which is more flexiable, that can be used as testing location for the IOT applications in Koln.
We are not running the data pipeline here becasue our data has been changed in the mobilithek. This change caused a drop in some coloums that i was planning to use in my analysis Therefore , i used a local version which i have downloaded beforehand.